5 research outputs found

    Efficient multi-task based facial landmark and gesture detection in monocular images

    Communication between persons includes several channels for exchanging information. Non-verbal communication carries valuable information about the context of the conversation and is a key element for understanding the interaction as a whole. Facial expressions are a representative example of this kind of non-verbal communication and a valuable element for improving human-machine interaction interfaces. Using images captured by a monocular camera, automatic facial analysis systems can extract facial expressions to improve human-machine interactions. However, several technical factors must be considered, including possible computational limitations (e.g. autonomous robots) or data throughput constraints (e.g. a centralized computation server). Considering these possible limitations, this work presents an efficient method to detect a set of 68 facial feature points and a set of key facial gestures at the same time. The output of this method provides valuable information for understanding the context of communication and improving the response of automatic human-machine interaction systems.
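
    The abstract does not detail the network design; purely as an illustration of the multi-task idea, the PyTorch-style sketch below feeds a single shared backbone into two task heads, one regressing 68 (x, y) landmark coordinates and one scoring a small set of facial gestures. The backbone layers, feature size and gesture count are assumptions, not the paper's architecture.

        # Hypothetical multi-task model: shared features, two heads (illustrative only).
        import torch
        import torch.nn as nn

        class MultiTaskFaceNet(nn.Module):
            def __init__(self, num_gestures: int = 6, feat_dim: int = 128):
                super().__init__()
                # Small convolutional backbone shared by both tasks (sizes are assumptions).
                self.backbone = nn.Sequential(
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(64, feat_dim), nn.ReLU(),
                )
                self.landmark_head = nn.Linear(feat_dim, 68 * 2)       # (x, y) per landmark
                self.gesture_head = nn.Linear(feat_dim, num_gestures)  # gesture logits

            def forward(self, x):
                feats = self.backbone(x)
                landmarks = self.landmark_head(feats).view(-1, 68, 2)
                gestures = self.gesture_head(feats)
                return landmarks, gestures

        # A single forward pass yields both outputs, which is what makes a
        # multi-task formulation cheaper than running two separate networks.
        model = MultiTaskFaceNet()
        landmarks, gestures = model(torch.randn(1, 3, 128, 128))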

    Building synthetic simulated environments for configuring and training multi-camera systems for surveillance applications

    Synthetic simulated environments are gaining popularity in the Deep Learning era, as they can alleviate the effort and cost of two critical tasks in building multi-camera systems for surveillance applications: setting up the camera system to cover the use cases and generating the labeled dataset needed to train the required Deep Neural Networks (DNNs). However, no simulated environments are ready to solve these tasks for all kinds of scenarios and use cases. Typically, ‘ad hoc’ environments are built, which cannot be easily applied to other contexts. In this work we present a methodology to build synthetic simulated environments that are general enough to be usable in different contexts with little effort. Our methodology tackles the challenges of appropriately parameterizing scene configurations, of defining strategies to randomly generate a wide and balanced range of situations of interest for training DNNs with synthetic data, and of quickly capturing images from virtual cameras while accounting for rendering bottlenecks. We show a practical implementation example for the detection of incorrectly placed luggage in aircraft cabins, including a qualitative and quantitative analysis of the data generation process and its influence on DNN training, and the modifications required to adapt it to other surveillance contexts. This work has received funding from the Clean Sky 2 Joint Undertaking under the European Union’s Horizon 2020 research and innovation program under grant agreement No. 865162, SmaCS (https://www.smacs.eu/).
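
    As a minimal sketch of the scene-parameterization and random-sampling idea, the snippet below draws balanced random scene configurations that a renderer could then turn into labeled images. The parameter names, value ranges and luggage-related fields are illustrative assumptions, not the paper's actual configuration space.

        # Hypothetical scene configuration, sampled at random to balance a synthetic dataset.
        import random
        from dataclasses import dataclass

        @dataclass
        class SceneConfig:
            camera_height_m: float
            camera_yaw_deg: float
            num_luggage_items: int
            misplaced_ratio: float     # fraction of items placed incorrectly
            lighting_intensity: float

        def sample_scene(rng: random.Random) -> SceneConfig:
            return SceneConfig(
                camera_height_m=rng.uniform(1.8, 2.3),
                camera_yaw_deg=rng.uniform(-30.0, 30.0),
                num_luggage_items=rng.randint(0, 8),
                misplaced_ratio=rng.choice([0.0, 0.25, 0.5]),  # keep situation classes balanced
                lighting_intensity=rng.uniform(0.4, 1.0),
            )

        rng = random.Random(42)
        dataset_plan = [sample_scene(rng) for _ in range(1000)]  # handed to the renderer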

    On-demand serverless video surveillance with optimal deployment of deep neural networks

    We present an approach to optimally deploy Deep Neural Networks (DNNs) in serverless cloud architectures. A serverless architecture allows running code in response to events, automatically managing the required computing resources. However, these resources have limitations in terms of execution environment (CPU only), cold starts, space, scalability, etc. These limitations hinder the deployment of DNNs, especially considering that fees are charged according to the resources employed and the computation time. Our deployment approach comprises multiple decoupled software layers that allow effective management of multiple processes, such as business logic, data access, and computer vision algorithms that leverage DNN optimization techniques. Experimental results in AWS Lambda reveal its potential to build cost-effective on-demand serverless video surveillance systems. This work has been partially supported by the program ELKARTEK 2019 of the Basque Government under project AUTOLIB.
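
    A minimal sketch of the decoupled-layers idea for an AWS Lambda handler follows; the module structure, event fields and placeholder model are assumptions for illustration, not the system described in the paper. Caching the loaded model at module level is a common way to soften cold starts on warm invocations.

        import json

        _model = None  # reused across warm invocations of the same Lambda container

        def _load_model():
            global _model
            if _model is None:
                # Inference layer: load an optimized, CPU-friendly network here
                # (e.g. a quantized model bundled with the deployment package).
                _model = object()  # placeholder
            return _model

        def handler(event, context):
            model = _load_model()
            frame_key = event.get("frame_key")   # data-access layer would fetch this frame
            detections = []                      # computer-vision layer: run the DNN here
            # Business-logic layer: decide what to report or trigger.
            return {
                "statusCode": 200,
                "body": json.dumps({"frame": frame_key, "detections": detections}),
            }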

    Designing Automated Deployment Strategies of Face Recognition Solutions in Heterogeneous IoT Platforms

    In this paper, we tackle the problem of deploying face recognition (FR) solutions in heterogeneous Internet of Things (IoT) platforms. The main challenges are the optimal deployment of deep neural networks (DNNs) on a wide variety of IoT devices (e.g., robots, tablets, smartphones), the secure management of biometric data while respecting the users’ privacy, and the design of appropriate user interaction with facial verification mechanisms for all kinds of users. We analyze different approaches to solving these challenges and propose a knowledge-driven methodology for the automated deployment of DNN-based FR solutions in IoT devices, with secure management of biometric data and real-time feedback for improved interaction. We provide practical examples and experimental results with state-of-the-art DNNs for FR on Intel and NVIDIA hardware platforms used as IoT devices. This work was supported by the SHAPES project, which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 857159, and in part by the Spanish Centre for the Development of Industrial Technology (CDTI) through the project ÉGIDA (RED DE EXCELENCIA EN TECNOLOGIAS DE SEGURIDAD Y PRIVACIDAD) under Grant CER20191012.
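
    Purely as an illustration of what a knowledge-driven deployment decision might look like, the sketch below maps a hypothetical device-capability description to an FR model variant; the device fields and model names are invented for the example and are not the paper's actual ontology or model zoo.

        from dataclasses import dataclass

        @dataclass
        class Device:
            has_gpu: bool
            has_vpu: bool   # e.g. a Movidius-style vision accelerator
            ram_mb: int

        def select_fr_model(dev: Device) -> str:
            # Pick the most optimized variant the device can actually run.
            if dev.has_gpu:
                return "fr_fp16_gpu"        # GPU-optimized engine
            if dev.has_vpu:
                return "fr_vpu_optimized"   # accelerator-optimized model
            if dev.ram_mb >= 2048:
                return "fr_cpu_int8"        # quantized CPU model
            return "fr_mobile_lite"         # smallest fallback

        print(select_fr_model(Device(has_gpu=False, has_vpu=True, ram_mb=1024)))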

    Markerless full-body human motion capture and combined motor action recognition for human-computer interaction

    Typically, people interact with computers by means of devices such as the keyboard and the mouse. Computers can detect the events coming from these devices, such as pressing or releasing isolated or combined keyboard and mouse buttons, or mouse motions, and then react according to the interpretation assigned to them. This communication procedure has been used satisfactorily for a wide range of applications. However, it lacks the naturalness of face-to-face human communication.

    This thesis project presents a method for markerless real-time capture and automatic interpretation of full-body human movements for human-computer interaction (HCI).

    Three stages can be distinguished to reach this objective: (1) markerless tracking of as many of the user’s body parts as possible, (2) reconstruction of the kinematic 3D skeleton that represents the user’s pose from the tracked body parts, and (3) recognition of movement patterns so that the computer can “understand” the user’s intent and react accordingly. These three processes must run in real time to attain satisfactory HCI.

    The first stage can be solved by means of cameras focused on the user and computer vision algorithms that extract and track the user’s relevant features from the images. This project proposes a method that combines color probabilities and optical flow. The second stage requires placing the kinematic 3D skeleton in plausible biomechanical poses fitted to the detected body parts, considering previous poses in order to obtain smooth motions. This project proposes an analytic-iterative inverse kinematics method that places the body parts sequentially, from the torso to the upper and lower limbs, taking into account biomechanical limits and the most relevant collisions. Finally, the last stage requires analyzing which features of motion are significant in order to interpret patterns with artificial intelligence techniques. This project proposes a method to automatically extract potential gestures from the data flow and then label them, allowing combined actions to be performed.
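
    The three stages map naturally onto a processing pipeline. The skeleton below is only a structural sketch with stubbed bodies, since the actual algorithms (color/optical-flow tracking, analytic-iterative inverse kinematics, gesture labeling) are the substance of the thesis; function names and signatures are assumptions.

        from typing import List, Tuple

        def track_body_parts(frame) -> List[Tuple[float, float]]:
            # Stage 1: locate and track body parts in the image
            # (color probabilities combined with optical flow).
            return []

        def fit_skeleton(parts, previous_pose):
            # Stage 2: fit a biomechanically plausible 3D skeleton to the tracked
            # parts, using the previous pose to keep the motion smooth.
            return previous_pose

        def recognize_gesture(pose_history) -> str:
            # Stage 3: segment the pose stream and label the performed gesture.
            return "none"

        def process_stream(frames):
            pose, history = None, []
            for frame in frames:   # all three stages must keep up with the frame rate
                parts = track_body_parts(frame)
                pose = fit_skeleton(parts, pose)
                history.append(pose)
                yield recognize_gesture(history)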